Visual data mining and analysis of software repositories
Abstract
In this article we describe an ongoing effort to integrate information visualization techniques into the process of configuration management for software systems. Our focus is to help software engineers manage the evolution of large and complex software systems by offering them effective and efficient ways to query and assess system properties using visual techniques. To this end, we combine several techniques from different domains, as follows. First, we construct an infrastructure that allows generic querying and data mining of different types of software repositories such as CVS and Subversion. Using this infrastructure, we construct several models of the software source code evolution at different levels of detail, ranging from project and package down to function and code line. Second, we describe a set of views that allow examining the code evolution models at different levels of detail and from different perspectives. We detail three views. The file view shows changes at line level across many versions of one or a few files. The project view shows changes at file level across entire software projects. The decomposition view shows changes at subsystem level across entire projects. We illustrate how the proposed techniques, which we implemented in a fully operational toolset, have been used to answer non-trivial questions on several real-world, industry-size software projects. Our work is at the crossroads of applied software engineering (SE) and information visualization (InfoVis), as our toolset aims to tightly integrate the methods promoted by the InfoVis field into SE practice. © 2007 Published by Elsevier Ltd.
Similar resources
Mining Software Repositories for Software Change Impact Analysis: A Case Study
Data mining algorithms have recently been applied to software repositories to help with the maintenance of evolving software systems. In the past, information about which classes changed together, obtained by mining software repositories, was used to guide future changes. We use this information to measure the possible impacts of a proposed change. In this paper we propose and compare two approac...
Mining Container Image Repositories for Software Configuration and Beyond
This paper introduces the idea of mining container image repositories for configuration and other deployment information of software systems. Unlike traditional software repositories (e.g., source code repositories and app stores), image repositories encapsulate the entire execution ecosystem for running target software, including its configurations, dependent libraries and components, and OS-l...
A Survey on Mining Software Repositories
This paper presents the fundamental concepts, overall process and recent research issues of Mining Software Repositories (MSR). Data sources such as source control systems, bug tracking systems and archived communications are presented, along with the data types and techniques used for general MSR problems. Finally, evaluation approaches, opportunities and open challenges are given. Key words: mining, software...
A Proposed Data Mining Methodology and its Application to Industrial Procedures
Data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data stored in repositories, corporate databases, and data warehouses. Industrial procedures with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...
Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report
The Mining Software Repositories (MSR) field analyzes software repository data to uncover knowledge and assist the development of ever-growing, complex systems. However, existing approaches and platforms for MSR analysis face many challenges when performing large-scale MSR studies. Such approaches and platforms rarely scale easily out of the box. Instead, they often require custom scaling tricks an...
Journal title:
- Computers & Graphics
Volume 31
Publication date: 2007